Speech Parameter Sequence Modeling with Latent Trajectory Hidden Markov Model

Author

  • Hirokazu Kameoka
Abstract

A weakness of hidden Markov models (HMMs) is that they have difficulty capturing the local dynamics of feature sequences, owing to the piecewise stationarity assumption and the conditional independence assumption on feature sequences. Traditionally, speech recognition systems have circumvented this limitation by appending dynamic (delta and delta-delta) components to the feature vectors. HMM-based speech synthesis systems [1] also use the joint vector of static and dynamic features as the observed vector in the training process. In the synthesis process, on the other hand, a sequence of static features is generated according to the output probabilities of the trained HMM given an input sentence, taking into account the explicit constraint between the static and dynamic features [2]. Although the HMM-based speech synthesis framework has many attractive features, one drawback is that the criteria used for training and synthesis are inconsistent: the joint likelihood of static and dynamic features is maximized during training, whereas the likelihood of only the static features is maximized during synthesis. This implies that the model parameters are not trained in such a way that the generated parameter sequences become optimal. To address this problem, Zen [3] introduced a variant of the HMM called the “trajectory HMM,” obtained by incorporating the explicit relationship between static and dynamic features into the traditional HMM. This provides a unified framework for the training and synthesis of speech parameter sequences; however, it makes parameter inference difficult. Since the conditional independence assumption on the feature vectors is lost, efficient algorithms for training and decoding regular HMMs, such as the Viterbi algorithm and the forward-backward algorithm, are no longer applicable to the trajectory HMM. Thus, approximations and brute-force methods are usually necessary to obtain training and decoding algorithms [3, 4]. In this paper, we propose a new model called the “latent trajectory HMM.” In contrast with the conventional trajectory HMM, the present model splits the generative process of an observed feature sequence into two processes: one for a sequence of joint vectors of static and dynamic features given the HMM states, and the other for an observed feature sequence given the sequence of joint vectors. By treating the joint vector of static and dynamic features as a latent variable to be marginalized out, we obtain a probability density function of an observed feature sequence with a dif-
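
The explicit constraint between static and dynamic features mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes one-dimensional features, a single delta window of the form 0.5 * (c[t+1] - c[t-1]) with clamped edges (delta-delta omitted), diagonal variances, and an already known state sequence. The function names `delta_window_matrix` and `generate_static` are hypothetical.

```python
import numpy as np

def delta_window_matrix(T):
    """Build W (2T x T) so that the joint static+delta sequence is o = W @ c.

    Assumption: delta[t] = 0.5 * (c[t+1] - c[t-1]), with indices clamped
    at the sequence edges. Rows alternate static, delta, static, delta, ...
    """
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                        # static row copies c[t]
        W[2 * t + 1, min(t + 1, T - 1)] += 0.5   # delta row, forward term
        W[2 * t + 1, max(t - 1, 0)] -= 0.5       # delta row, backward term
    return W

def generate_static(W, mu, var):
    """Static sequence c maximizing N(W c; mu, diag(var)).

    mu and var hold the per-frame means and variances of the joint
    static+delta vector, in the same row order as W. The maximizer
    solves the normal equations (W^T S^-1 W) c = W^T S^-1 mu.
    """
    P = W.T * (1.0 / var)      # W^T S^-1 via broadcasting, shape (T, 2T)
    return np.linalg.solve(P @ W, P @ mu)

# Usage: a step in the static means with zero delta means yields a
# trajectory that is smoothed across the state boundary.
T = 6
W = delta_window_matrix(T)
mu = np.zeros(2 * T)
mu[0::2] = [0, 0, 0, 1, 1, 1]  # static means; delta means stay at 0
trajectory = generate_static(W, mu, np.ones(2 * T))
print(trajectory)
```

Because the generated sequence depends on all frames jointly through W, the frames are no longer conditionally independent given the states, which is the source of the inference difficulty the abstract describes.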

Similar resources

An introduction of trajectory model into HMM-based speech synthesis

In the synthesis part of a hidden Markov model (HMM) based speech synthesis system which we have proposed, a speech parameter vector sequence is generated from a sentence HMM corresponding to an arbitrarily given text by using a speech parameter generation algorithm. However, there is an inconsistency: although the speech parameter vector sequence is generated under the constraints between stat...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

Speech-driven lip motion generation with a trajectory HMM

Automatic speech animation remains a challenging problem that can be described as finding the optimal sequence of animation parameter configurations given some speech. In this paper we present a novel technique to automatically synthesise lip motion trajectories from a speech signal. The developed system predicts lip motion units from the speech signal and generates animation trajectories autom...

Modeling trajectories in the HMM framework

Most state-of-the-art statistical speech recognition systems use hidden Markov models (HMMs) for modeling the speech signal. However, limited by the assumption of conditional independence of observations given the state sequence, current HMMs poorly model the trajectory constraints in speech. In [1], we introduced the parallel path HMM, where each phonetic unit is represented by a parallel coll...

Parametric subspace modeling of speech transitions

This report describes an attempt at capturing segmental transition information for speech recognition tasks. The slowly varying dynamics of spectral trajectories carries much discriminant information that is very crudely modelled by traditional approaches such as HMMs. In approaches such as recurrent neural networks there is the hope, but not the convincing demonstration, that such transitional...

Publication date: 2015